Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] baseline forge #15586

Closed
wants to merge 9 commits into from
Closed

Conversation

danielxiangzl
Copy link
Contributor

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

@danielxiangzl danielxiangzl added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Dec 12, 2024
Copy link

trunk-io bot commented Dec 12, 2024

⏱️ 2h 4m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
test-target-determinator 1h 13m 🟩🟩🟩🟩 (+17 more)
rust-images / rust-all 13m 🟩
rust-move-tests 13m 🟩
forge-e2e-test / forge 6m 🟥
forge-e2e-test / forge 5m
file_change_determinator 4m 🟩🟩🟩🟩🟩 (+17 more)
check-dynamic-deps 2m 🟩🟩
rust-cargo-deny 2m 🟩
permission-check 1m 🟩🟩🟩🟩🟩 (+18 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+18 more)
permission-check 1m 🟩🟩🟩🟩🟩 (+18 more)
determine-docker-build-metadata 50s 🟩🟩🟩🟩🟩 (+17 more)
semgrep/ci 48s 🟩🟩
general-lints 27s 🟩
file_change_determinator 10s 🟩

settingsfeedbackdocs ⋅ learn more about trunk.io

This comment has been minimized.

This comment has been minimized.

)
.await
.unwrap();
panic!("test_fault_tolerance_of_leader_equivocation");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The panic!() statement at the end of this test appears to be forcing a failure, which defeats the purpose of the test's actual validation logic through successful_criteria(). Since the test already has proper assertions, this line should be removed to allow the test to properly validate the leader equivocation fault tolerance behavior.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

Comment on lines 19 to 47
impl NetworkLoadTest for PerformanceBenchmark {
async fn test(
&self,
swarm: Arc<tokio::sync::RwLock<Box<dyn Swarm>>>,
_report: &mut TestReport,
duration: Duration,
) -> Result<()> {
let validators = { swarm.read().await.get_validator_clients_with_names() };
let num_bad_leaders = validators.len() / 10;
for (index, (name, validator)) in validators.iter().enumerate().take(num_bad_leaders) {
validator
.set_failpoint(
"consensus::leader_equivocation".to_string(),
"off".to_string(),
)
.await
.map_err(|e| {
anyhow!(
"set_failpoint to set consensus leader equivocation on {} failed, {:?}",
name,
e
)
})?;
};
Ok(())
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test currently disables leader equivocation but returns immediately without waiting for the specified test duration. Adding tokio::time::sleep(duration).await before returning would allow the test to run for its full intended duration and properly observe the system behavior with the disabled failpoint.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

This comment has been minimized.

This comment has been minimized.

@danielxiangzl danielxiangzl added the CICD:build-failpoints-images Build failpoints docker image label Dec 13, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@danielxiangzl danielxiangzl force-pushed the daniel-baseline-clean branch 2 times, most recently from 76e5434 to 6cd0fc7 Compare December 13, 2024 23:37

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

❌ Forge suite realistic_env_max_load failure on failpoints_b9ff5f4344c14520f83bd49d53a97543c5e92a42

Load (TPS)                     | submitted/s  | committed/s  | expired/s    | rejected/s   | chain txn/s  | latency      | p50 lat      | p90 lat      | p99 lat      | mempool->block | prop->order  | order->commit | actual dur   | idx_fn       | idx_cache    | idx_data    
100                            | 99.92        | 99.92        | 0.00         | 0.00         | 100          | 1.266        | 0.900        | 2.000        | 2.100        | 0.000          | 0.000        | 0.000         | 78           | 0            | 0.000        | 0.000       
1000                           | 999.96       | 999.96       | 0.00         | 0.00         | 1003         | 1.091        | 1.000        | 1.200        | 2.400        | 0.000          | 0.000        | 0.000         | 78           | 0            | 0.000        | 0.000       
5000                           | 5000.00      | 5000.00      | 0.00         | 0.00         | 5022         | 1.258        | 1.200        | 1.400        | 1.600        | 0.000          | 0.000        | 0.000         | 78           | 0            | 0.000        | 0.000       
10000                          | 9999.91      | 9424.82      | 575.09       | 0.00         | 8209         | 7.691        | 1.800        | 32.400       | 54.000       | 0.458          | 0.333        | 0.382         | 78           | 0            | 0.000        | 0.000       
Test Failed: Failed latency check, for ["P50 latency for 100 is 0.9s and exceeds limit of 0.9s", "P90 latency for 100 is 2s and exceeds limit of 1.1s", "P99 latency for 100 is 2.1s and exceeds limit of 1.2s"]

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.89/src/error.rs:85:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_latency
             at ./testsuite/forge/src/success_criteria.rs:564:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_core_for_success
             at ./testsuite/forge/src/success_criteria.rs:267:9
   3: <aptos_testcases::load_vs_perf_benchmark::LoadVsPerfBenchmark as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/load_vs_perf_benchmark.rs:414:17
   4: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   5: <aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:639:37
   6: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   7: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:63
   8: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:107:5
   9: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:73:5
  10: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:31
  11: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9
  12: tokio::runtime::handle::Handle::block_on_inner::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:324:22
  13: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16
  14: tokio::runtime::handle::Handle::block_on_inner
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:323:9
  15: tokio::runtime::handle::Handle::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:302:18
  16: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:332:50
  17: forge::run_forge_with_changelog
             at ./testsuite/forge-cli/src/main.rs:426:24
  18: forge::main
             at ./testsuite/forge-cli/src/main.rs:329:21
  19: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  20: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  21: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  22: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  23: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  24: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  25: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  26: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  27: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  28: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  29: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  30: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  31: main
  32: __libc_start_main
  33: _start
Trailing Log Lines:
  28: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  29: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  30: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  31: main
  32: __libc_start_main
  33: _start

=== BEGIN JUNIT ===
<?xml version="1.0" encoding="UTF-8"?>
<testsuites name="forge" tests="1" failures="1" errors="0" uuid="2a14f3fb-7a9a-4b31-8e2e-5b9f575c90af">
    <testsuite name="local" tests="1" disabled="0" errors="0" failures="1">
        <testcase name="CompositeNetworkTest(CpuChaosWrapper(network:multi-region-network-emulation(load vs perf test))) with ">
            <failure message="Failed latency check, for [&quot;P50 latency for 100 is 0.9s and exceeds limit of 0.9s&quot;, &quot;P90 latency for 100 is 2s and exceeds limit of 1.1s&quot;, &quot;P99 latency for 100 is 2.1s and exceeds limit of 1.2s&quot;]

Stack backtrace:
   0: anyhow::error::&lt;impl anyhow::Error&gt;::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.89/src/error.rs:85:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_latency
             at ./testsuite/forge/src/success_criteria.rs:564:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_core_for_success
             at ./testsuite/forge/src/success_criteria.rs:267:9
   3: &lt;aptos_testcases::load_vs_perf_benchmark::LoadVsPerfBenchmark as aptos_forge::interface::network::NetworkTest&gt;::run::{{closure}}
             at ./testsuite/testcases/src/load_vs_perf_benchmark.rs:414:17
   4: &lt;core::pin::Pin&lt;P&gt; as core::future::future::Future&gt;::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   5: &lt;aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest&gt;::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:639:37
   6: &lt;core::pin::Pin&lt;P&gt; as core::future::future::Future&gt;::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   7: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:63
   8: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:107:5
   9: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/coop.rs:73:5
  10: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/park.rs:281:31
  11: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/blocking.rs:66:9
  12: tokio::runtime::handle::Handle::block_on_inner::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:324:22
  13: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/context/runtime.rs:65:16
  14: tokio::runtime::handle::Handle::block_on_inner
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:323:9
  15: tokio::runtime::handle::Handle::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/handle.rs:302:18
  16: aptos_forge::runner::Forge&lt;F&gt;::run
             at ./testsuite/forge/src/runner.rs:332:50
  17: forge::run_forge_with_changelog
             at ./testsuite/forge-cli/src/main.rs:426:24
  18: forge::main
             at ./testsuite/forge-cli/src/main.rs:329:21
  19: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  20: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  21: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  22: core::ops::function::impls::&lt;impl core::ops::function::FnOnce&lt;A&gt; for &amp;F&gt;::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  23: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  24: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  25: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  26: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  27: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  28: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  29: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  30: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  31: main
  32: __libc_start_main
  33: _start"/>
        </testcase>
    </testsuite>
</testsuites>
=== END JUNIT ===

Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:381"},"thread_name":"main","hostname":"forge-e2e-pr-15586-1734212469-failpoints-b9ff5f4344c14520f83bd4","timestamp":"2024-12-14T21:58:25.986575Z","message":"Deleting namespace forge-e2e-pr-15586: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:398"},"thread_name":"main","hostname":"forge-e2e-pr-15586-1734212469-failpoints-b9ff5f4344c14520f83bd4","timestamp":"2024-12-14T21:58:25.986597Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-15586"}
Failed to run tests:
Tests Failed

failures:
    CompositeNetworkTest

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Error: Tests Failed

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.89/src/error.rs:85:36
   1: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:358:13
   2: forge::run_forge_with_changelog
             at ./testsuite/forge-cli/src/main.rs:426:24
   3: forge::main
             at ./testsuite/forge-cli/src/main.rs:329:21
   4: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
   9: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  10: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  12: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  15: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  16: main
  17: __libc_start_main
  18: _start
Debugging output:
NAME                                   READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge15-0       1/1     Running     0          16m
aptos-node-0-validator-0               1/1     Running     0          16m
aptos-node-1-fullnode-eforge15-0       1/1     Running     0          16m
aptos-node-1-validator-0               1/1     Running     0          16m
aptos-node-2-fullnode-eforge15-0       1/1     Running     0          16m
aptos-node-2-validator-0               1/1     Running     0          16m
aptos-node-3-fullnode-eforge15-0       1/1     Running     0          16m
aptos-node-3-validator-0               1/1     Running     0          16m
aptos-node-4-fullnode-eforge15-0       1/1     Running     0          16m
aptos-node-4-validator-0               1/1     Running     0          16m
aptos-node-5-validator-0               1/1     Running     0          16m
aptos-node-6-validator-0               1/1     Running     0          16m
aptos-node-7-validator-0               1/1     Running     0          16m
aptos-node-8-validator-0               1/1     Running     0          16m
aptos-node-9-validator-0               1/1     Running     0          16m
forge-testnet-deployer-2pxkk           0/1     Completed   0          16m
genesis-aptos-genesis-eforge15-fcrks   0/1     Completed   0          16m

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:build-failpoints-images Build failpoints docker image CICD:run-forge-e2e-perf Run the e2e perf forge only
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants